Empirical results

On this page we present the empirical results of our study on graph neural networks (GNNs) for molecular property prediction. We first describe the datasets, then compare mean and max readout, relate test accuracy to the number of parameters and to train time, and examine test accuracy against homophily. Finally, we report results for the pooling methods and architectures we tested, and identify the best architecture for each pooling method and the best pooling method for each architecture.

Our Datasets

We used four datasets for our experiments: MUTAG, PROTEINS, ENZYMES, and NCI1. The table below summarizes the key characteristics of these datasets.

Statistic                MUTAG  PROTEINS   ENZYMES      NCI1
Number of graphs           188      1113       600      4110
Number of classes            2         2         6         2
Number of features           7         3         3        37
Homophily                0.721     0.657     0.667     0.631
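The page does not say how the homophily row was computed. One common definition is edge homophily: the fraction of edges that join two nodes with the same label. The sketch below assumes that definition and uses discrete node labels as the labels (for example, the atom types behind MUTAG's 7 one-hot features). It also assumes per-graph scores are averaged over the dataset. Both the averaging choice and the function names are illustrative assumptions, so the sketch may not reproduce the values in the table.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def edge_homophily(edges: Sequence[Edge], labels: Sequence[int]) -> float:
    """Fraction of edges whose two endpoints carry the same label.

    Storing each undirected edge once or in both directions gives the same
    value, because both copies of an edge agree on whether labels match.
    """
    if not edges:
        raise ValueError("homophily is undefined for a graph without edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def dataset_homophily(graphs: List[Tuple[Sequence[Edge], Sequence[int]]]) -> float:
    """Average per-graph edge homophily, skipping edgeless graphs (assumed convention)."""
    scores = [edge_homophily(edges, labels) for edges, labels in graphs if edges]
    return sum(scores) / len(scores)


# Toy path graph 0-1-2 with node labels [0, 0, 1]: one of its two edges links equal labels.
print(edge_homophily([(0, 1), (1, 2)], [0, 0, 1]))  # 0.5
```

An alternative pools all edges of a dataset before dividing. That weights large graphs more heavily and can give a different number.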

Some examples of graphs from the MUTAG dataset are shown below.


Figure: Some graphs from the MUTAG dataset (Source: bui2022ingrex)

Mean vs Max Readout

We compared mean and max readout functions for GNNs using Wilcoxon tests. The table below reports, for MUTAG, PROTEINS, and ENZYMES, the p-value, the mean difference between the two readouts, and the best architecture.

Dataset     p-value   Mean difference   Best architecture
MUTAG         0.258            -0.008   GINConv_EDGE_max
PROTEINS       0.33             0.009   GCN_EDGE_max
ENZYMES       0.207             -0.01   GINConv_EDGE_mean

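All three p-values are above 0.05, so the tests found no statistically significant difference between mean and max readout on these datasets. Failing to reject the null hypothesis does not prove the two readouts are equivalent. However, since neither readout was significantly better, we used only global max pooling in the rest of our experiments.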
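Mean and max readout turn the node embeddings of one graph into a single graph-level vector. For each feature, they take the average or the maximum over all nodes. A minimal stdlib sketch on plain lists follows; the function names are illustrative. In PyTorch Geometric, the batched equivalents are `global_mean_pool` and `global_max_pool`.

```python
from typing import List

Vector = List[float]


def mean_readout(node_embeddings: List[Vector]) -> Vector:
    """Per-feature average over all nodes of one graph."""
    if not node_embeddings:
        raise ValueError("graph has no nodes")
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


def max_readout(node_embeddings: List[Vector]) -> Vector:
    """Per-feature maximum over all nodes of one graph."""
    if not node_embeddings:
        raise ValueError("graph has no nodes")
    return [max(col) for col in zip(*node_embeddings)]


# Toy 3-node graph with 2-dimensional node embeddings.
h = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
print(mean_readout(h))  # [2.0, 0.6666666666666666]
print(max_readout(h))   # [3.0, 4.0]
```

Max readout keeps the strongest activation of each feature from any single node, while mean readout weights every node equally.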

Test Accuracy vs Number of Parameters on MUTAG

The plot below shows the test accuracy vs. the number of parameters for various GNN architectures on the MUTAG dataset.

Figure: Test accuracy vs number of parameters on MUTAG
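The page does not list layer widths or say how parameters were counted. As a rough guide, the sketch below counts trainable weights for hypothetical GCN and GIN stacks under stated assumptions:

- each GCN layer is one weight matrix plus bias;
- each GIN layer is a two-layer MLP with no batch norm and a fixed epsilon;
- both stacks end in a linear classifier.

The input width of 7 and the 2 classes come from the MUTAG column of the dataset table. Three layers of width 64 are illustrative, not our settings. In PyTorch, the usual count is `sum(p.numel() for p in model.parameters())`.

```python
def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Dense layer: a d_in x d_out weight matrix plus an optional bias vector."""
    return d_in * d_out + (d_out if bias else 0)


def gcn_params(d_in: int, hidden: int, layers: int, classes: int) -> int:
    """Assumed GCN stack: one linear map per layer, then a linear classifier."""
    total, width = 0, d_in
    for _ in range(layers):
        total += linear_params(width, hidden)
        width = hidden
    return total + linear_params(width, classes)


def gin_params(d_in: int, hidden: int, layers: int, classes: int) -> int:
    """Assumed GIN stack: a 2-layer MLP (width -> hidden -> hidden) per layer,
    no batch norm and a fixed epsilon, then a linear classifier."""
    total, width = 0, d_in
    for _ in range(layers):
        total += linear_params(width, hidden) + linear_params(hidden, hidden)
        width = hidden
    return total + linear_params(width, classes)


# Hypothetical MUTAG-sized models: 7 input features and 2 classes (from the
# dataset table); 3 layers of width 64 are illustrative only.
print(gcn_params(7, 64, 3, 2))  # 8962
print(gin_params(7, 64, 3, 2))  # 21442
```

Under these assumptions, each hidden GIN layer (8320 weights at width 64) has exactly twice the weights of a GCN layer of the same width (4160).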

Test Accuracy vs Train Time on MUTAG

The plot below shows the test accuracy vs. train time for various GNN architectures on the MUTAG dataset.

Figure: Test accuracy vs train time on MUTAG

Test Accuracy vs Homophily

The plot below shows the test accuracy vs. homophily for various GNN architectures on the four datasets.

Figure: Test accuracy vs homophily

Results by Pooling

The table below reports test accuracy on each dataset, together with train time, for the pooling methods and architectures we tested.

Pooling  Architecture  ENZYMES        MUTAG          NCI1           PROTEINS       Train time
EDGE     GCN           0.294 ± 0.026  0.703 ± 0.081  0.717 ± 0.015  0.753 ± 0.024  1327
EDGE     GIN           0.353 ± 0.039  0.847 ± 0.063  0.735 ± 0.010  0.731 ± 0.017  1156
MEWIS    GIN           0.309 ± 0.055  0.789 ± 0.077  0.744 ± 0.006  0.743 ± 0.016  4365
None     GCN           0.316 ± 0.044  0.703 ± 0.065  0.651 ± 0.015  0.743 ± 0.029  40
None     GIN           0.327 ± 0.042  0.803 ± 0.068  0.734 ± 0.018  0.733 ± 0.028  59
SAG      GAT           0.189 ± 0.025  0.676 ± 0.062  0.617 ± 0.024  0.722 ± 0.050  112
SAG      GCN           0.195 ± 0.033  0.682 ± 0.073  0.630 ± 0.021  0.689 ± 0.041  53
SAG      GIN           0.188 ± 0.040  0.761 ± 0.081  0.639 ± 0.036  0.714 ± 0.039  59
TOPK     GAT           0.208 ± 0.054  0.689 ± 0.093  0.623 ± 0.045  0.682 ± 0.033  110
TOPK     GCN           0.176 ± 0.035  0.739 ± 0.075  0.631 ± 0.034  0.694 ± 0.032  55
TOPK     GIN           0.205 ± 0.056  0.761 ± 0.079  0.617 ± 0.033  0.697 ± 0.027  56
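The page does not define the pooling operators. Assuming TOPK is top-k pooling as in Graph U-Nets (PyTorch Geometric's `TopKPooling`), it works in four steps:

1. Score each node by projecting its features onto a learnable vector p.
2. Keep the ceil(ratio · N) highest-scoring nodes.
3. Gate their features by tanh(score).
4. Keep only edges between retained nodes.

SAG pooling (`SAGPooling`) follows the same pattern but computes the score with a GNN layer. The stdlib sketch below covers the top-k step only, with p fixed to a toy vector instead of learned. `topk_pool` is an illustrative name, not our implementation.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def topk_pool(x: List[List[float]], edges: Sequence[Edge], p: Sequence[float],
              ratio: float = 0.5) -> Tuple[List[List[float]], List[Edge], List[int]]:
    """Top-k pooling sketch: returns pooled features, re-indexed edges, kept node ids."""
    norm = math.sqrt(sum(w * w for w in p))
    # Projection score of each node onto p, squashed by tanh.
    scores = [math.tanh(sum(xi * w for xi, w in zip(row, p)) / norm) for row in x]
    k = math.ceil(ratio * len(x))
    # Indices of the k highest-scoring nodes, in descending score order.
    kept = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)[:k]
    new_id = {old: new for new, old in enumerate(kept)}
    # Gate each kept node's features by its score (this lets p receive gradients in training).
    x_pooled = [[v * scores[i] for v in x[i]] for i in kept]
    # Keep only edges whose endpoints both survive, using the new node ids.
    edges_pooled = [(new_id[u], new_id[v]) for u, v in edges
                    if u in new_id and v in new_id]
    return x_pooled, edges_pooled, kept


x = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
x_p, e_p, kept = topk_pool(x, edges, p=[1.0, 0.0], ratio=0.5)
print(kept, e_p)  # [2, 0] [(1, 0)]
```

Because the operator always keeps a fixed fraction of nodes, it discards the same share of each graph regardless of the graph's structure.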

Best architecture per pooling method (highest mean test accuracy on each dataset):

Pooling  ENZYMES  MUTAG  NCI1  PROTEINS
EDGE     GIN      GIN    GIN   GCN
MEWIS    GIN      GIN    GIN   GIN
None     GIN      GIN    GIN   GCN
SAG      GCN      GIN    GIN   GAT
TOPK     GAT      GIN    GCN   GIN
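Each entry in the table above is the architecture with the highest mean test accuracy for that pooling method and dataset; the ± spread plays no part in the choice. Below is a minimal sketch of that selection, using the SAG mean accuracies copied from the results table. The dictionary layout and the `best_per_dataset` name are illustrative.

```python
from typing import Dict

# Mean test accuracy of the SAG rows, copied from the results table above.
sag_results: Dict[str, Dict[str, float]] = {
    "GAT": {"ENZYMES": 0.189, "MUTAG": 0.676, "NCI1": 0.617, "PROTEINS": 0.722},
    "GCN": {"ENZYMES": 0.195, "MUTAG": 0.682, "NCI1": 0.630, "PROTEINS": 0.689},
    "GIN": {"ENZYMES": 0.188, "MUTAG": 0.761, "NCI1": 0.639, "PROTEINS": 0.714},
}


def best_per_dataset(results: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each dataset, the row key with the highest mean accuracy.

    Assumes every row reports the same datasets; ties go to the first row.
    """
    datasets = next(iter(results.values())).keys()
    return {ds: max(results, key=lambda row: results[row][ds]) for ds in datasets}


print(best_per_dataset(sag_results))
# {'ENZYMES': 'GCN', 'MUTAG': 'GIN', 'NCI1': 'GIN', 'PROTEINS': 'GAT'}
```

Keyed by pooling method for a single architecture instead, the same helper yields a row of the best-pooling-per-architecture table. Because only means are compared, some choices rest on very small gaps. Under SAG on ENZYMES, GCN (0.195 ± 0.033) beats GAT (0.189 ± 0.025) by 0.006, well within either reported ± range.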

Results by Architecture

The table below reports test accuracy and train time grouped by GNN architecture.

Architecture  Pooling  ENZYMES        MUTAG          NCI1           PROTEINS       Train time
GAT           MEWIS    0.295 ± 0.040  0.742 ± 0.086  0.693 ± 0.008  0.722 ± 0.022  3225
GAT           None     0.310 ± 0.053  0.679 ± 0.087  0.659 ± 0.023  0.734 ± 0.027  90
GCN           EDGE     0.294 ± 0.026  0.703 ± 0.081  0.717 ± 0.015  0.753 ± 0.024  1327
GCN           None     0.316 ± 0.044  0.703 ± 0.065  0.651 ± 0.015  0.743 ± 0.029  40
GCN           TOPK     0.176 ± 0.035  0.739 ± 0.075  0.631 ± 0.034  0.694 ± 0.032  55
GIN           EDGE     0.353 ± 0.039  0.847 ± 0.063  0.735 ± 0.010  0.731 ± 0.017  1156
GIN           MEWIS    0.309 ± 0.055  0.789 ± 0.077  0.744 ± 0.006  0.743 ± 0.016  4365

The table below shows the best pooling method for each architecture (highest mean test accuracy on each dataset).

Architecture  ENZYMES  MUTAG  NCI1   PROTEINS
GAT           None     MEWIS  MEWIS  None
GCN           None     TOPK   EDGE   EDGE
GIN           EDGE     EDGE   MEWIS  MEWIS